Public T-Cell Receptors (TCRs) Revisited by Analysis of the Magnitude of Identical and Highly-Similar TCRs in Virus-Specific T-Cell Repertoires of Healthy Individuals

Since multiple different T-cell receptor (TCR) sequences can bind to the same peptide-MHC combination and the number of TCR-sequences that can theoretically be generated even exceeds the number of T cells in a human body, the likelihood that many public identical (PUB-I) TCR-sequences frequently contribute to immune responses has been estimated to be low. Here, we quantitatively analyzed the TCR-repertoires of 190 purified virus-specific memory T-cell populations, directed against 21 epitopes of Cytomegalovirus, Epstein-Barr virus and Adenovirus isolated from 29 healthy individuals, and determined the magnitude, defined as prevalence within the population and frequencies within individuals, of PUB-I TCR and of TCR-sequences that are highly-similar (PUB-HS) to these PUB-I TCR-sequences. We found that almost one third of all TCR nucleotide-sequences represented PUB-I TCR amino-acid (AA) sequences and found an additional 12% of PUB-HS TCRs differing by maximally 3 AAs. We illustrate that these PUB-I and PUB-HS TCRs were structurally related and contained shared core-sequences in their TCR-sequences. We found a prevalence of PUB-I and PUB-HS TCRs of up to 50% among individuals and showed frequencies of virus-specific PUB-I and PUB-HS TCRs making up more than 10% of each virus-specific T-cell population. These findings were confirmed by using an independent TCR-database of virus-specific TCRs. We therefore conclude that the magnitude of the contribution of PUB-I and PUB-HS TCRs to these virus-specific T-cell responses is high. Because the T cells from these virus-specific memory TCR-repertoires were the result of successful control of the virus in these healthy individuals, these PUB-HS TCRs and PUB-I TCRs may be attractive candidates for immunotherapy in immunocompromised patients that lack virus-specific T cells to control viral reactivation.


INTRODUCTION
Human virus-specific CD8 pos T cells express heterodimeric alpha (a)/beta (b) TCRs that can specifically recognize viral peptides presented by HLA-class-I molecules (1). The TCRa- and TCRb-chain repertoires are highly variable due to the genetic recombination process involved in their generation. For the TCRb-chains, recombination of 1 of 48 functional T-cell Receptor Beta Variable (TRBV), 1 of 2 functional T-cell Receptor Beta Diversity (TRBD) and 1 of 12 functional T-cell Receptor Beta Joining (TRBJ) gene segments leads to a V-D-J reading frame (2). The TCRa-chains are generated by a similar recombination process but lack a diversity gene, resulting in a V-J reading frame (3). Insertion of template-independent nucleotides between the recombined segments (junctional region) results in a significant further increase in variability (4). The sequence around these junctions encodes the Complementarity Determining Region 3 (CDR3), a loop that reaches out and interacts with a peptide embedded in an HLA molecule, together with the loops of the CDR1 and CDR2 regions, which are fixed within the germline variable gene sequence (5,6). It has been calculated that these gene rearrangements could potentially generate a repertoire of 10^15-10^20 unique TCRs that may interact with all possible peptide-HLA complexes (7).
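As a rough illustration of why junctional insertions matter, the germline segment counts quoted above can be multiplied out. The sketch below only counts V-D-J segment choices for the TCRb-chain; the function name is illustrative, and the calculation deliberately ignores junctional nucleotide insertions and pairing with the TCRa-chain, which account for the remainder of the estimated diversity.

```python
# Functional germline segment counts for the TCRb locus, as quoted in the text.
N_TRBV = 48
N_TRBD = 2
N_TRBJ = 12


def germline_vdj_combinations(n_v: int, n_d: int, n_j: int) -> int:
    """Number of distinct V-D-J segment choices, ignoring junctional edits."""
    return n_v * n_d * n_j


combinations_trb = germline_vdj_combinations(N_TRBV, N_TRBD, N_TRBJ)
print(combinations_trb)  # 1152
```

The result (about 10^3 segment combinations) is many orders of magnitude below the quoted 10^15-10^20, which is consistent with the statement that junctional variability contributes most of the repertoire diversity.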
Pathogenic viruses like Cytomegalovirus (CMV), Epstein-Barr virus (EBV) and Adenovirus (AdV) can infect humans for life by staying latently present in target cells after a primary infection. In healthy individuals these latent viruses are controlled by the virus-specific T cells. As a result, reactivations of these latent viruses are observed frequently, but do not result in severe virus-associated disorders like malignancies and/or organ failure. However, in the absence of a competent immune system, these latent viruses remain uncontrolled and are associated with high morbidity and mortality in immunocompromised patients, including patients after stem cell or organ transplantation (8,9). To control these viruses, antigen-experienced (central-memory and effector-memory) virus-specific T cells have to develop from the naïve T-cell repertoire. Due to the high diversity of the naïve T-cell repertoire (10), T-cell responses against the many potential viral epitopes presented in multiple HLA alleles may be composed of a large variety of different TCRs. Indeed, when naïve umbilical cord blood-derived T cells were stimulated in vitro to generate de novo responses against proteins from CMV or Human Immunodeficiency Virus (HIV), this resulted in responding virus-specific T-cell populations with a highly diverse repertoire of TCRs, recognizing many different CMV (11) or HIV-derived peptides (12). However, from ex vivo analyses in adults it became clear that in vivo the virus-specific memory T-cell populations are shaped during control and clearance of the infection and target only a limited number of viral peptides, as was shown for T-cell populations specific for viruses like CMV (13), EBV (14), AdV (15), Influenza A (16) and also more recently SARS-CoV-2 (17,18). 
Nevertheless, the multiple viral peptides that are targeted in the various HLA alleles make it theoretically unlikely that individuals would frequently share exactly the same virus-specific TCR, unless T cells expressing certain TCRs would favor control of infections and would therefore dominate the responses. Since latent viruses like CMV, EBV and AdV are not fully eradicated, reactivations are frequent and trigger the expansion of antigen-experienced virus-specific memory T cells. This unique biology might contribute to the preferential selection and skewing of specific TCRs expressed by T cells that dominate the response and control the virus.
Evidence for selection of certain virus-specific TCR-expressing T cells in controlling viruses has come from several reports identifying identical TCR amino-acid (AA) sequences in dominant virus-specific memory T-cell populations in different individuals, designated as public TCR-sequences [from here on referred to as public-identical (PUB-I) TCR-sequences]. Most studies investigated the presence of PUB-I TCR-sequences in TCRb-chains, since this is the most diverse TCR chain and the CDR3b sequence of CD8 T cells is positioned to interact with the antigenic peptide presented by HLA-class-I molecules. However, dominant virus-specific memory T-cell populations with PUB-I TCRa chains have also been described previously (19). Such PUB-I TCR sequences are most overtly observed in antigen-experienced memory virus-specific T cells due to the in vivo antigen-driven proliferation, but are also present within the naïve T-cell compartment, although at low frequency (20). PUB-I TCRb sequences have been found in T-cell populations specific for latent viruses, such as CMV (21-23) and EBV (24,25), but also for non-latent viruses, such as Influenza (16), respiratory syncytial virus (26) and SARS-CoV-2 (17,27). In addition, some of these virus-specific T-cell populations also contained TCR AA-sequences that were highly-similar to the identical shared TCR AA-sequence [from here on referred to as highly-similar to PUB-I (PUB-HS) TCR-sequences]. However, the magnitude, defined as prevalence within the population and frequencies within virus-specific T-cell repertoires, of PUB-I and PUB-HS TCR-sequences is not known. 
A high probability of being generated during V-D-J recombination may play a role in the occurrence of such shared sequences (28), but since virus-specific memory T-cell repertoires in the circulation are shaped based on antigen encounter and subsequent proliferation, the PUB-I and PUB-HS TCR-sequences most likely reflect highly functional T cells capable of antigen-driven proliferation.
We hypothesize that frequent induction of antigen-driven proliferation of virus-specific T cells targeting frequently reactivating latent viruses will increase the prevalence and frequencies of PUB-I and PUB-HS TCR-sequences within the repertoire of antigen-experienced virus-specific T cells. Molecular analysis of these TCRs will aid the analysis of the development, presence and quality of memory T-cell responses, and the tracking of virus-specific T-cell responses. Furthermore, identification of dominant TCRs with shared core-sequences may be utilized for the design of future immunotherapies, including TCR-gene transfer. Therefore, the aim of our study was to quantitatively analyze the magnitude of PUB-I and PUB-HS TCRb-sequences within the antigen-experienced virus-specific TCR-repertoires of CMV, EBV and AdV-specific CD8 pos memory T cells. We confirmed that healthy individuals generate many different virus-specific TCRs, illustrated by the >3000 TCR nucleotide-sequences that were found ex vivo in virus-specific memory T-cell populations. However, a significant part of the virus-specific TCR-repertoires contained PUB-I and PUB-HS TCR nucleotide-sequences. The AAs of these PUB-HS TCRs varied at specific positions in the CDR3b-region, while maintaining a conserved core-AA-sequence that was also present in the respective PUB-I TCR. We identified conserved TCR core-AA-sequences for each specificity that could be used for diagnostic purposes when examining anti-viral immune responses. Additionally, PUB-I or PUB-HS TCRs with the highest frequencies in healthy individuals may be utilized to develop off-the-shelf immunotherapeutics (using TCR-gene transfer) to effectively control CMV, EBV or AdV-infections or reactivations in immunocompromised patients.

Collection of Donor Material
After informed consent according to the Declaration of Helsinki, healthy individuals (homozygously) expressing HLA-A*01:01 and HLA-B*08:01 or HLA-A*02:01 and HLA-B*07:02 were selected from the Sanquin database and the biobank of the department of Hematology, Leiden University Medical Center (LUMC). Peripheral blood mononuclear cells (PBMCs) were isolated by standard Ficoll-Isopaque separation and used directly or thawed after cryopreservation in the vapor phase of liquid nitrogen. Donor characteristics (HLA typing, CMV and EBV serostatus) are provided in Table 1. For isolation of donor-derived virus-specific T cells using fluorescence-activated cell sorting (FACS; for gating strategy see Supplementary Figure 1) with pMHC-tetramers (Table 2) and expansion of donor-derived virus-specific T cells, see Supplementary Material and Methods.

TCRb-Library Preparation
TCRb-sequences were identified using ARTISAN PCR adapted for TCR PCR (29,30). Total mRNA was extracted from 190 pMHC-tetramer pos purified (Supplementary Figure 2A) virus-specific T-cell populations (31) using magnetic beads (Dynabead mRNA DIRECT kit; Invitrogen, Thermo Fisher Scientific). Ten µl (~1µg) of mRNA per sample was mixed with TCRb constant region-specific primers (1mM final concentration) and SmartSeq2-modified template-switching oligonucleotide (SS2m_TSO; 1mM final concentration) and denatured for 3 minutes at 72°C. After cooling, cDNA was synthesized for 90 minutes at 42°C with 170 U SMARTscribe reverse transcriptase (Takara, Clontech) in a total volume of 20µl containing 1.7U/ml RNasin (Promega), 1.7mM DTT (Invitrogen, Thermo Fisher Scientific), 0.8mM each of high-purity RNAse-free dNTPs (Invitrogen, Thermo Fisher Scientific) and 4µl of 5x first-strand buffer. During cDNA synthesis, a non-templated 3' polycytosine terminus was added (Supplementary Figure 2B), which created a template for extension of the cDNA with the TSO (32). PCR (2min at 98°C followed by 40 cycles of [1s at 98°C, 15s at 67°C, 15s at 72°C], 2 min at 72°C) of 5µl of cDNA was then performed using Phusion Flash (Thermo Fisher Scientific) with anchor-specific primer (SS2m_For; 1mM final concentration) and each (1mM final concentration) of the nested primers specific for the constant regions of TCRb constant 1 and TCRb constant 2. Both forward and reverse PCR primers contained overhanging sequences suitable for barcoding. Amplicons were purified and underwent a second PCR (2min at 98°C followed by 10 cycles of [1s at 98°C, 15s at 65°C, 30s at 72°C], 2 min at 72°C) using forward and reverse primers (1mM final concentration) with overhanging sequences containing identifiers (sequences of 6 base-pairs) and adapter sequences appropriate for Illumina HiSeq platforms (or PacBio; Pacific Biosciences). Unique identifiers were used for each T-cell population targeting one epitope. 
Forward or reverse identifiers were shared between T-cell populations targeting different epitopes. For all primer sequences see Supplementary Table 1. For identifier sequences see Supplementary Table 2. Amplicons with identifiers were purified, quantified and pooled into one library for paired-end sequencing of 150bp on an Illumina HiSeq4000. Deep sequencing was performed at GenomeScan (Leiden, The Netherlands). Raw data were de-multiplexed and aligned to the matching TRBV, TRBD, TRBJ and constant (TRBC) genes. CDR3b-sequences were built using MIXCR software using a bi-directional approach (5'-3' and 3'-5' read) (Supplementary Figure 2C) (33). CDR3b-sequences with a stop-codon were removed from the library. Bi-directional readings using MIXCR could result in out-of-frame CDR3b AA-sequences due to the even number of nucleotides. These sequences (n=392) were manually aligned with the germline TRBV and TRBJ-sequence. CDR3b-sequences were further processed using custom scripts in R to compare specificities and sharing of CDR3b-sequences.
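The two quality checks described above (removal of CDR3b-sequences containing a stop-codon and flagging of out-of-frame sequences for manual alignment) can be sketched as follows. This is a minimal illustration and not the authors' R scripts; the function name, the labels and the toy nucleotide strings are assumptions made for the example.

```python
# Standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cdr3_nt(nt_seq: str) -> str:
    """Label a CDR3 nucleotide sequence as in_frame, stop_codon or out_of_frame."""
    seq = nt_seq.upper()
    # A length that is not a multiple of three cannot be translated in frame.
    if len(seq) % 3 != 0:
        return "out_of_frame"
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "stop_codon"
    return "in_frame"


# Toy sequences (hypothetical, not taken from the study data).
examples = ["TGTGCCAGCAGC", "TGTGCCTGAAGC", "TGTGCCAGCAG"]
labels = [classify_cdr3_nt(s) for s in examples]
print(labels)  # ['in_frame', 'stop_codon', 'out_of_frame']
```

In the workflow described in the text, sequences labelled as containing a stop-codon would be removed, whereas out-of-frame sequences would be set aside for manual alignment against the germline TRBV and TRBJ-sequences.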

Computational Unbiased Repertoire Analysis
The following R-packages were used in R-software to generate a nodal plot of CDR3b AA-sequences with the Levenshtein distance as parameter for similarity: "igraph" to create network objects and to obtain the degree of a node and its betweenness (34), "data.table" to organize CDR3b-sequences, "stringdist" to calculate Levenshtein distances (35), "Biostrings" for fast manipulation of large biological sequences or sets of sequences (36), "dplyr" to arrange and filter data (37), "tibble" for providing opinionated data frames, "ggplot2" for generating figures (38) and "RColorBrewer" to create graphics (39). A Levenshtein distance of 0.25 was added to visualize multiple identical sequences. Nodes with identical sequences (Levenshtein distance of 0.25) were manually replaced by pie-charts using Adobe Illustrator CC 2018.
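The network construction described above can be illustrated with a small sketch. The study used R ("stringdist" and "igraph"); the Python below is a stand-in that assumes a plain dynamic-programming Levenshtein distance and an edge list instead of a graph object. The cut-off of 3 edits, the function names and all toy sequences except CASSYQGGNYGYTF are assumptions for the example; the weight of 0.25 for identical sequences follows the text.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def build_edges(seqs, max_dist=3, identical_weight=0.25):
    """Connect sequences within max_dist edits; identical ones get weight 0.25."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        d = levenshtein(a, b)
        if d <= max_dist:
            edges.append((i, j, identical_weight if d == 0 else d))
    return edges


def node_degrees(n_nodes, edges):
    """Number of edges attached to each node."""
    degree = [0] * n_nodes
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


# The first sequence is quoted in the text; the others are hypothetical variants.
seqs = [
    "CASSYQGGNYGYTF",   # donor 1
    "CASSYQGGNYGYTF",   # same AA-sequence in donor 2
    "CASSLQGGNYGYTF",   # one substitution
    "CASSYQGGGNYGYTF",  # one insertion
    "CSAPGQGNTEAFF",    # unrelated sequence
]
edges = build_edges(seqs)
print(node_degrees(len(seqs), edges))  # [3, 3, 3, 3, 0]
```

The unrelated sequence stays isolated, while the identical and highly-similar sequences form one cluster, which is the pattern the nodal plots are meant to reveal.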

Sequence Logo Plots
To identify which positions of PUB-I and PUB-HS CDR3b AA-sequences were conserved and which were variable, all CDR3b AA-sequences with the most frequent CDR3 length were included and the AAs were stacked for each position in the sequence. The overall height of the stacks indicates the sequence conservation at that position, while the height of symbols within the stacks indicates the relative frequency of each AA at that position. AAs have colors according to their chemical properties: polar AAs (G, S, T, Y, C, Q, N) are shown in green, basic AAs (K, R, H) in blue, acidic AAs (D, E) in red, and hydrophobic AAs (A, V, L, I, P, W, F, M) in black (40).
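The stacking procedure can be made concrete with a short sketch. It assumes the common Shannon formulation of a sequence logo, in which the stack height at a position is log2(20) minus the entropy of the AA distribution, without small-sample correction; the text does not state which formula the logo software used, so this is an assumption. The function name and the variant sequences are hypothetical.

```python
import math
from collections import Counter

# Colour scheme as described in the text.
AA_COLOR = {
    **{aa: "green" for aa in "GSTYCQN"},   # polar
    **{aa: "blue" for aa in "KRH"},        # basic
    **{aa: "red" for aa in "DE"},          # acidic
    **{aa: "black" for aa in "AVLIPWFM"},  # hydrophobic
}


def logo_columns(seqs):
    """Per-position (information in bits, AA frequencies) for the modal length."""
    modal_len = Counter(len(s) for s in seqs).most_common(1)[0][0]
    kept = [s for s in seqs if len(s) == modal_len]
    columns = []
    for pos in range(modal_len):
        counts = Counter(s[pos] for s in kept)
        freqs = {aa: n / len(kept) for aa, n in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        columns.append((math.log2(20) - entropy, freqs))
    return columns


# First sequence from the text; the other two are hypothetical variants.
seqs = ["CASSYQGGNYGYTF", "CASSLQGGNYGYTF", "CASSYQGGGNYGYTF"]
columns = logo_columns(seqs)
for position, (bits, freqs) in enumerate(columns, start=1):
    symbols = ", ".join(f"{aa} ({AA_COLOR[aa]}, {p:.2f})" for aa, p in freqs.items())
    print(position, round(bits, 2), symbols)
```

In this toy set the 15-AA sequence is excluded because 14 is the most frequent length, position 5 loses one bit of information because Y and L are equally frequent, and all other positions are fully conserved.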

Generation of Independent TCR-Database From Virus-Specific T-Cell Products Generated for a Clinical Trial
As an independent TCR-database containing TCR-sequences from virus-specific T cells, we used the information obtained from the virus-specific T-cell products generated in the context of the phase I/II safety and feasibility study T Control (EudraCT-number 2014-003171-39) using the MHC-I-Streptamer isolation technology (Juno Therapeutics, Munich, Germany) (20,41). Sequencing was performed as described above for all virus-specific T-cell populations per donor, resulting in unique identifiers for all virus-specific T-cell populations in the TCRb-library.

Generation and Validation of a Library of TCR-Sequences Derived From FACsorted Virus-Specific T-Cell Populations
To examine the composition of the virus-specific TCR-repertoires in different individuals, the CDR3b-regions of purified expanded pMHC-tetramer-binding virus-specific T-cell populations were sequenced (Supplementary Figure 2). We analyzed the TCR-repertoires of CMV, EBV and AdV-specific T cells, restricted to four prevalent HLA alleles (HLA-A*01:01, HLA-A*02:01, HLA-B*07:02 and HLA-B*08:01) and specific for 21 different peptides (Table 2). Purified CMV, EBV and AdV-specific T-cell populations targeting CMV (n=8), EBV (n=10) or AdV (n=3)-derived peptides were isolated from 17 HLA-A*01:01/B*08:01 pos individuals and 12 HLA-A*02:01/B*07:02 pos individuals (Tables 1, 2). In total, 190 virus-specific T-cell populations, each targeting a single viral epitope, were successfully isolated and showed high purity (>97% pMHC-tetramer positive). The mean precursor frequencies of the different T-cell specificities in the starting PBMC materials are shown in Supplementary Figure 3. Sequencing of the CDR3b-regions of these virus-specific T-cell populations resulted in 3346 CDR3b nucleotide-sequences that occurred at frequencies of more than 0.1%. Of these, 135 nucleotide-sequences were present at high frequencies (>5%) within one specificity, but were also found at low frequencies (around 0.1%) in another specificity, indicating contamination due to FACSorting impurities. These low-frequency nucleotide-sequences were discarded from further analysis. Another 41 nucleotide-sequences were present at low frequencies in two unrelated specificities and could not be correctly annotated; these 41 duplicates (n=82) were also discarded from the analysis. Therefore, a total of 3129 nucleotide-sequences could be annotated. 
In total, 2224 (71%) of these nucleotide-sequences represented unique CDR3b AA-sequences that were found in only one individual and 905 nucleotide-sequences (29%) resulted in 131 different PUB-I CDR3b AA-sequences that were found in two or more unrelated individuals (Flowchart; Figure 1). To investigate the relationship between the numbers of CDR3b nucleotide-sequences and the translated number of CDR3b AA-sequences, we compared the nucleotide-sequences of the 131 PUB-I CDR3b AA-sequences found in different individuals. Different nucleotide-sequences can result in the same CDR3b AA-sequence, a phenomenon known as convergent recombination. We found that PUB-I CDR3b AA-sequences present at high (representative example; Figure 2A) or low frequencies (range 0.1-1%) (representative example; Figure 2B) were in most cases encoded by different nucleotide-sequences in different individuals (Figure 4). Because the majority of nucleotide-sequences encoding the same CDR3b AA-sequences were different between individuals, these data exclude contamination as an explanation for the finding of PUB-I CDR3b AA-sequences.
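The distinction between unique and PUB-I CDR3b AA-sequences, and the check for convergent recombination, can be sketched as a simple grouping step. This is an illustrative sketch, not the authors' R code: the record layout (donor, nucleotide-sequence, AA-sequence), the function name and the three-residue toy fragments are assumptions. In the toy data, TGTGCCAGC and TGCGCTAGT both encode the residues C-A-S.

```python
from collections import defaultdict


def classify_by_sharing(records):
    """Label each AA-sequence as PUB-I (two or more donors) or unique (one donor)."""
    donors_by_aa = defaultdict(set)
    nts_by_aa = defaultdict(set)
    for donor, nt_seq, aa_seq in records:
        donors_by_aa[aa_seq].add(donor)
        nts_by_aa[aa_seq].add(nt_seq)
    return {
        aa_seq: {
            "label": "PUB-I" if len(donors) >= 2 else "unique",
            "n_donors": len(donors),
            # More than one nucleotide variant points to convergent recombination.
            "n_nt_variants": len(nts_by_aa[aa_seq]),
        }
        for aa_seq, donors in donors_by_aa.items()
    }


# Toy records with shortened, hypothetical fragments.
records = [
    ("donor1", "TGTGCCAGC", "CAS"),
    ("donor2", "TGCGCTAGT", "CAS"),  # same AAs, different codons
    ("donor1", "TGTGCCACC", "CAT"),
]
result = classify_by_sharing(records)
print(result["CAS"])  # {'label': 'PUB-I', 'n_donors': 2, 'n_nt_variants': 2}
print(result["CAT"])  # {'label': 'unique', 'n_donors': 1, 'n_nt_variants': 1}
```

A shared AA-sequence that is encoded by different nucleotide-sequences in different donors, as in the first toy case, is the situation the text uses to argue against contamination.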

PUB-I and PUB-HS CMV-, EBV-and AdV-Specific CDR3b AA-Sequences Are Abundant in Virus-Specific T-Cell Populations
We then investigated the distribution of the 131 PUB-I CDR3b AA-sequences within the 21 different specificities and the prevalence among individuals for each of the PUB-I CDR3b AA-sequences per viral-epitope. T cells with PUB-I CDR3b AA-sequences were found for 19 out of the 21 specificities (Supplementary Table 3). PUB-I CDR3b AA-sequences were not observed in AdV-IE1 LLD and EBV-LMP2 ESE -specific T-cell populations. Some T-cell populations (e.g. EBV-LMP2 FLY ) contained many different PUB-I CDR3b AA-sequences (n=24) that were all highly-similar. For this reason, we investigated the distribution of PUB-I CDR3b AA-sequences with unique TRBV and TRBJ-gene usage. This resulted in 29 different PUB-I CDR3b AA-sequences, distributed over 19 specificities (Figure 3A; grey bars). Six specificities contained two or three different (expressing different TRBV and/or TRBJ-genes) PUB-I CDR3b AA-sequences that were highly prevalent among individuals. To investigate how often these PUB-I CDR3b AA-sequences could be found in our cohort of healthy donors, we quantified the prevalence of each of these 29 PUB-I CDR3b AA-sequences (Figure 3A; grey bars). Because we classified a PUB-I CDR3b AA-sequence as being present in at least 2 individuals, the prevalence among donors could not be less than 2 out of 17 (12%; 17 is the maximum number of T-cell populations for 1 specificity). Only 4 out of 29 PUB-I CDR3b AA-sequences were found in only 2 individuals (Figure 3A; grey bars). Overall, these 29 PUB-I CDR3b AA-sequences had a median prevalence of 33% among healthy individuals (range 12%-82%). Importantly, most PUB-I CDR3b AA-sequences were found in at least 25% of individuals and 5 were even present in more than half the donors. We and others hypothesized that the binding/docking of TCRs to HLA-peptide complexes might allow for small changes/flexibility in the CDR3 AA-sequences without significantly changing the conformation or interaction (42). 
Therefore, we investigated if there were CDR3b AA-sequences present in our data set that were highly-similar (PUB-HS) to PUB-I CDR3b AA-sequences and differed by 1, 2 or 3 AAs. The 2224 unique TCR nucleotide-sequences identified in our previous analysis may contain PUB-HS CDR3b AA-sequences that are in fact part of the same public response as the respective PUB-I TCRs. In total, 379 PUB-HS CDR3b nucleotide-sequences were present that also resulted in 379 PUB-HS CDR3b AA-sequences that differed by 1, 2 or 3 AAs from one of the 131 PUB-I CDR3b AA-sequences. This shows that 41% of the total virus-specific TCR-repertoire (905 PUB-I plus 379 PUB-HS out of 3129 nucleotide-sequences) consisted of PUB-I and PUB-HS CDR3b nucleotide-sequences.
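The counts reported in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable and function names are illustrative.

```python
def percentage(part: int, total: int) -> float:
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total


# Counts reported in the text.
n_above_threshold = 3346   # nucleotide-sequences at frequencies above 0.1%
n_cross_contaminant = 135  # low-frequency copies of sequences dominant elsewhere
n_unannotatable = 82       # 41 sequences found at low frequency in two specificities
n_unique = 2224            # found in a single individual
n_pub_i = 905              # encoding the 131 PUB-I AA-sequences
n_pub_hs = 379             # within 1, 2 or 3 AAs of a PUB-I AA-sequence

n_annotated = n_above_threshold - n_cross_contaminant - n_unannotatable
print(n_annotated)                                          # 3129
print(round(percentage(n_pub_i, n_annotated)))              # 29
print(round(percentage(n_pub_hs, n_annotated)))             # 12
print(round(percentage(n_pub_i + n_pub_hs, n_annotated)))   # 41
```

The PUB-HS sequences are drawn from the 2224 sequences initially classified as unique, so the 41% figure is the sum of the 29% PUB-I and 12% PUB-HS shares of the same 3129 annotated nucleotide-sequences.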
We investigated if these PUB-HS CDR3b AA-sequences were also present in individuals that did not contain the respective PUB-I CDR3b AA-sequences. PUB-HS CDR3b AA-sequences were present for 21 out of 29 PUB-I CDR3b AA-sequences (Figure 3A; shaded orange bars). When we included the PUB-HS CDR3b AA-sequences and quantified the 29 PUB-I and PUB-HS CDR3b AA-sequences, these had a median prevalence of 50% among healthy individuals (range 23%-100%). The AdV-IE1 LLD and EBV-LMP2 ESE -specific T-cell populations, where PUB-I CDR3b AA-sequences were not found, did contain PUB-HS CDR3b AA-sequences in multiple individuals at high frequencies (43) (Supplementary Figure 5). The frequencies of PUB-I combined with PUB-HS CDR3b AA-sequences were relatively high within each virus-specific T-cell population of each individual (Figure 3B). The frequencies of all PUB-I and PUB-HS CDR3b AA-sequences ranged from 0.1%-99.4% within the 19 different virus-specific T-cell populations with a median of 13.1%. When combined, all but one PUB-I plus PUB-HS CDR3b AA-sequences were found in at least 25% of individuals and 3 were even found in over 75% of individuals. These data show that for many PUB-I CDR3b AA-sequences we found sequences that were similar (1, 2 or 3 AA-differences), making up more than 40% of the total virus-specific TCR-repertoire and together these sequences were found in the majority of individuals at high frequencies.

FIGURE 1 | Flowchart of included and excluded CDR3b nucleotide and AA-sequences. In total, 190 different virus-specific T-cell populations were FACsorted using pMHC-tetramers, followed by a short-term in vitro stimulation. The CDR3b nucleotide-sequences were determined using next-gen Illumina sequencing. CDR3b nucleotide-sequences that occurred at a frequency of less than 0.1% in each sample were excluded. CDR3b nucleotide-sequences that were identical and present in two different specificities, but present at high frequencies in one specificity, were only removed from the specificity that contained the sequences at very low frequencies (0.1-0.5%; n=135). CDR3b nucleotide-sequences that were identical and present in two different specificities at low frequency were considered contamination and removed from the library (82 sequences, 41 different sequences). The numbers of different CDR3b AA-sequences that were encoded by the CDR3b nucleotide-sequences are shown at protein level. We then assessed how many CDR3b AA-sequences were found in multiple individuals (shared) and how many were only found in a single individual (unique).

Huisman et al.

Identical and Highly-Similar CDR3b AA-Sequences Contain Conserved Regions in the Junctional Region
To investigate how the PUB-HS CDR3b AA-sequences related to the PUB-I CDR3b AA-sequences, we analyzed if the PUB-HS (1, 2 or 3 AA-differences) CDR3b AA-sequences showed variations at random positions or at specific positions compared to the PUB-I CDR3b AA-sequences. We hypothesized that if the binding/docking of PUB-HS TCRs was not significantly different, conserved regions and regions that allow for some variation could be identified in the CDR3b AA-sequences. As illustrated in Figure 4A for the EBV-LMP2 FLY -specific PUB-I CDR3b AA-sequence CASSYQGGNYGYTF, two motifs were identified with AA-differences predominantly located at positions 5 and 9/10 of the CDR3b-region. The AAs [QGG] at positions 6-8 were conserved for both motifs. In total, the two motifs consisted of 86 PUB-HS CDR3b AA-sequences with 1 or 2 AA-differences. The majority (n=71) had the same CDR3 length of 14 AAs as the PUB-I CDR3b AA-sequence, implying that variations were caused by AA-substitutions. The remaining 15 PUB-HS CDR3b AA-sequences had a CDR3 length of 15 AAs, due to AA-inserts, compared to the PUB-I CDR3b AA-sequence (Figure 4B). Similar rules were found for the other 20 PUB-I CDR3b AA-sequences. Here too, some AA-positions were highly conserved, whereas others were variable. However, the precise locations of the variable AAs differed between specificities (representative examples; Figure 4C). Interestingly, the corresponding CDR3alpha sequences of a few highly-frequent PUB-I and PUB-HS CDR3b-sequences were also identical or highly-similar between different individuals (Supplementary Table 4).
As a control, we assessed if these conserved motifs were predictive for the specificity when searching in our database of 2355 unique CDR3b AA-sequences. The requirement was that each motif should not be present in another specificity. We observed that some specificities contained motifs of only 3 or 4 AAs that were exclusive for that specificity and were not observed in any other specificity (Table 3). Altogether, these data show that the variations in the CDR3b AA-sequences were not random, but occurred at specific positions that resulted in conserved regions that were predictive for the specificities.
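The exclusivity requirement for a motif can be sketched as a search across specificities. This is a minimal illustration under stated assumptions: the motif syntax (lowercase x for any AA, bracketed alternatives such as [ND]) is a convention chosen for this example, the function names are hypothetical, and apart from CASSYQGGNYGYTF all sequences and the third specificity label are invented placeholders rather than study data.

```python
import re


def motif_to_regex(motif):
    """Translate a motif to a regex: 'x' matches any AA, '[ND]' means N or D."""
    return re.compile("".join("." if ch == "x" else ch for ch in motif))


def specificities_with_motif(database, motif):
    """Return the set of specificities in which at least one sequence has the motif."""
    pattern = motif_to_regex(motif)
    return {spec for spec, seqs in database.items()
            if any(pattern.search(seq) for seq in seqs)}


def is_exclusive(database, motif, specificity):
    """A motif is predictive only if it occurs in this specificity and no other."""
    return specificities_with_motif(database, motif) == {specificity}


# Toy database; only the first sequence is quoted in the text.
database = {
    "EBV-LMP2 FLY": ["CASSYQGGNYGYTF", "CASSLQGGNYGYTF"],
    "AdV-HEXON TDL": ["CSAPGQGNTEAFF"],
    "hypothetical-specificity": ["CASSLGTGGAYEQYF"],
}
print(is_exclusive(database, "QGG", "EBV-LMP2 FLY"))    # True
print(is_exclusive(database, "PGQG", "AdV-HEXON TDL"))  # True
print(is_exclusive(database, "GG", "EBV-LMP2 FLY"))     # False
```

The last call shows why short motifs need this control: GG also occurs in the placeholder specificity, so it would not be predictive, whereas QGG and PGQG are each confined to one specificity in this toy database.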

Computational Analysis Reveals Conserved Regions in CDR3b AA-Sequences Despite Using Different TRBJ-Genes
We hypothesized that if the conserved junctional region is a crucial part of the peptide-HLA binding, virus-specific TCR-repertoires could also contain CDR3b AA-sequences with the same conserved region, while allowing different TRBJ-gene usage, as long as the 3-dimensional conformation would allow this. Since the TRBJ-regions often differ by more than 3 AAs, we were not able to include these as PUB-HS CDR3b AA-sequences. Such PUB-HS CDR3b AA-sequences that use different TRBJ-genes might even further increase the prevalence of PUB-I and PUB-HS CDR3b AA-sequences in the virus-specific TCR-repertoire. To investigate this, we performed a computational analysis using the Levenshtein distances (AA-differences) between all different CDR3b AA-sequences. For four different specificities (EBV-LMP2 FLY , EBV-EBNA3A RPP , AdV-E1A LLD , and AdV-HEXON TDL ) we observed clustering of CDR3b AA-sequences that expressed the same TRBV-genes while using different TRBJ-genes. For example (Figure 5A), the HLA-A*02:01-restricted EBV-LMP2 FLY -specific CD8 pos T-cell repertoire contained 2 clusters within the cluster of TRBV6-5-expressing T cells (TRBV6-5/TRBJ1-2 and TRBV6-5/TRBJ2-1). The majority of CDR3b AA-sequences within the TRBJ1-2 cluster had a length of 14 AAs, while CDR3b AA-sequences from the TRBJ2-1 cluster had a length of 13 AAs (Figure 5B). Analysis of the junctional regions of the TRBV6-5/TRBJ1-2 and TRBV6-5/TRBJ2-1-encoded CDR3b AA-sequences revealed strong conservation of AAs [QGG] at positions 6-8, despite different TRBJ-usage and CDR3 lengths (Figure 5C). Similarly, the HLA-A*01:01-restricted AdV-HEXON TDL -specific CD8 pos T-cell repertoire contained two large clusters of CDR3b AA-sequences, using TRBV20-1 or TRBV5-1 (Figure 5D), all with a CDR3 length of 13 AAs. The first cluster (TRBV20-1) contained sub-clusters of CDR3b AA-sequences using TRBJ1-1, TRBJ2-3 or TRBJ2-7 and the second cluster (TRBV5-1) contained CDR3b AA-sequences using TRBJ2-1 or TRBJ2-7. 
AdV-HEXON TDL -specific CDR3b AA-sequences expressing TRBV20-1 revealed strong conservation of AAs [PGQG] at positions 4-7, which fell outside the region encoded by TRBJ (Figure 5E). Additionally, AdV-HEXON TDL -specific CDR3b AA-sequences expressing TRBV5-1 revealed strong conservation of AAs [N:D] at positions 4 and 7, despite different TRBJ-usage. These examples illustrate that virus-specific TCR-repertoires can have conserved CDR3b-regions, while using different TRBJ-genes, allowing substantial variability at specific positions encoded by the TRBJ-region. This will further increase the prevalence of PUB-I and PUB-HS CDR3b AA-sequences in the total virus-specific TCR-repertoire.

Individuals With Heterozygous HLA Backgrounds Contain the Same Shared Identical and Highly-Similar CDR3b AA-Sequences
To determine whether the magnitude of PUB-I and PUB-HS CDR3b AA-sequences was particular for our cohort of individuals with a homozygous HLA background, we investigated if the same phenomenon was also present in individuals with a heterogeneous HLA background. We performed the same analyses on virus-specific CD8 pos T-cell populations targeting 11 different viral epitopes that were generated and used in the context of a clinical study (20). A total of 1157 CDR3b nucleotide-sequences could be correctly annotated. In total, 695 (61%) nucleotide-sequences resulted in unique CDR3b AA-sequences that were only found in one individual, and 462 nucleotide-sequences (39%) resulted in 89 different PUB-I CDR3b AA-sequences. Among the 695 unique CDR3b AA-sequences, 134 PUB-HS CDR3b nucleotide-sequences were present that differed by 1, 2 or 3 AAs from one of the 89 PUB-I CDR3b AA-sequences. This shows that in this cohort, too, a large part (51%) of the total virus-specific TCR-repertoire consisted of PUB-I and PUB-HS CDR3b nucleotide-sequences. Because the targeted viral epitopes were not fully identical in both cohorts, we could investigate the prevalence of 20 out of 29 PUB-I CDR3b AA-sequences in this cohort. In total, 17 out of 20 CDR3b AA-sequences that were previously identified could also be identified in this independent cohort. When we included the PUB-HS CDR3b AA-sequences and quantified the 17 PUB-I and PUB-HS CDR3b AA-sequences, these sequences had a similarly high median prevalence of 89% among healthy individuals (range 26-100%) (Figure 6A). These CDR3b AA-sequences were also present at high frequencies within each virus-specific T-cell population (Figure 6B). These data show that the same PUB-I or PUB-HS CDR3b AA-sequences are also present in virus-specific T cells isolated from an independent cohort of individuals with a heterogeneous HLA background with a similar prevalence among donors and frequency within donors.

DISCUSSION
In this study, we quantitatively analyzed the magnitude, defined as prevalence within the population and frequencies within individuals, of public-identical (PUB-I) together with public-highly-similar (PUB-HS) TCRs in TCR-repertoires of CMV, EBV and AdV-specific CD8 pos T-cell populations. In total, 2224 (71%) TCR-CDR3b nucleotide-sequences resulted in unique CDR3b AA-sequences, and 905 nucleotide-sequences (29%) resulted in 131 different PUB-I CDR3b AA-sequences that were found in two or more unrelated individuals. These PUB-I CDR3b AA-sequences were distributed over 19 out of 21 virus-specificities and contained 29 different PUB-I CDR3b AA-sequences that were often found in multiple individuals at high frequencies. The virus-specific T-cell populations additionally contained 12% PUB-HS CDR3b AA-sequences, which differed by 1, 2 or 3 AAs compared to the respective PUB-I CDR3b AA-sequences. PUB-HS CDR3b AA-sequences could be found in virus-specific T-cell populations of individuals who did not contain the PUB-I CDR3b AA-sequence as well as of individuals who already contained the PUB-I CDR3b AA-sequence. Analysis of the PUB-I and PUB-HS CDR3b AA-sequences revealed strong conservation of specific AA motifs in the junctional region together with variability of AAs at specific positions at the TRBV/TRBD- and/or TRBD/TRBJ-border regions. Positions with high variability were often adjacent to or even interspersed with the conserved motif. The conserved motifs that we identified were unique for each specificity, and could not be identified in any other specificity in our database. This makes it very likely that these motifs are important for binding of the TCRs to the peptide-HLA complexes. Combined, 41% of the total virus-specific TCR-repertoire consisted of PUB-I and PUB-HS CDR3b nucleotide-sequences. 
These findings were based on virus-specific T-cell populations derived from two homogeneous donor cohorts that homozygously expressed HLA-A*01:01/HLA-B*08:01 or HLA-A*02:01/HLA-B*07:02. However, we found similarly high percentages (51%) of PUB-I and PUB-HS CDR3b nucleotide-sequences within virus-specific T-cell populations from healthy donors with heterogeneous HLA-backgrounds that were generated for a recent clinical study (20). These dominant PUB-I and PUB-HS TCRs probably reflect the viral-antigen-specific T-cell responses that most optimally encountered the peptide-HLA complexes on the infected target cells, and they could be utilized in the design of future immunotherapies, including TCR-gene transfer strategies.
Various explanations have been suggested to underlie the development of public TCRs in T-cell responses targeting the same antigenic epitope (44). One is a high probability that these PUB-I sequences are generated during V-D-J recombination (28,45). Furthermore, various nucleotide-sequences can result in the same TCR AA-sequence, which further increases this probability (46). Selection in vivo by optimal antigen-specific proliferation may result in a dominant antigen-specific memory T-cell population (47). These determinants may also lead to TCRs that are highly-similar to the PUB-I sequence, although such TCRs were often not included in the analyses of public T-cell responses. It has been shown that conserved AAs in the CDR3 loop provide a structural framework that is required for the maintenance of the three-dimensional TCR-structure (48). A similar structural framework between the PUB-I and PUB-HS sequences can thus lead to a conserved engagement with the peptide-HLA complex (49). Our rationale is that PUB-I and PUB-HS sequences are part of the same public T-cell response when they target the same peptide-HLA complex, express the same variable gene (and therefore have identical CDR1 and CDR2 regions) and contain the same conserved AAs in the CDR3 loop. With this set of rules, we were able to quantitatively analyze the public T-cell responses and showed that T cells expressing PUB-I TCRs together with T cells expressing PUB-HS TCRs made up at least 41% of the total TCR-repertoire. To assess the role of the alpha chains in PUB-I and PUB-HS TCRs, we identified the CDR3a sequence usage of a selection of virus-specific T-cell populations that contained shared TCRb sequences; the corresponding CDR3a sequences all proved to be identical or highly similar between individuals. However, the high percentages of shared TCR-sequences contradict the findings of Madi et al. (50), in which immunization of mice with a foreign ovalbumin (OVA)-derived peptide resulted in dominant private TCR-repertoires and fewer public TCRs (51). Since virus-specific memory T-cell repertoires in the circulation are shaped by antigen encounter and subsequent proliferation, the PUB-I and PUB-HS TCR-sequences most likely reflect highly functional T cells capable of antigen-driven proliferation.
[Legend: We investigated which regions of PUB-I or PUB-HS CDR3b AA-sequences were predictive for the specificity. We searched for each motif in our library of 2355 CDR3b AA-sequences to determine which parts of the junctional regions were unique for each specificity, without being present in any other specificity. Underscores with an x represent any of the 20 AAs. Some motifs contain two possible AAs that can be part of the motif, which are shown between brackets. The minimum motifs are also shown in bold font in the original CDR3b AA-sequences.]
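The motif search described in the legend (an x for any of the 20 AAs, bracketed alternatives, and a check that a motif occurs in one specificity only) can be sketched with regular expressions. The sketch assumes that a motif is written as plain uppercase AA letters, a lowercase x as wildcard and square brackets around alternatives. The motif strings, library contents and function names are hypothetical and are not taken from the study.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a motif such as 'QxY' or 'LGQ[AG]Y' into a regex.

    Assumed notation: 'x' = any of the 20 AAs, '[AG]' = A or G.
    Characters outside this notation are ignored.
    """
    parts = []
    for token in re.findall(r"\[[A-Z]+\]|x|[A-Z]", motif):
        parts.append(f"[{AMINO_ACIDS}]" if token == "x" else token)
    return re.compile("".join(parts))


def specificities_with_motif(motif, library):
    """Return the specificities in which at least one CDR3 contains the motif.

    library: dict mapping specificity -> list of CDR3 AA-sequences.
    """
    pattern = motif_to_regex(motif)
    return {spec for spec, seqs in library.items()
            if any(pattern.search(seq) for seq in seqs)}


def is_unique_for(motif, target, library):
    """True if the motif occurs in the target specificity and nowhere else."""
    return specificities_with_motif(motif, library) == {target}


# Hypothetical library (not taken from the study data).
library = {
    "specA": ["CASSLGQAYEQYF", "CASSLGQGYEQYF"],
    "specB": ["CASRDTNTGELFF"],
}

hits_bracket = specificities_with_motif("LGQ[AG]Y", library)
hits_wildcard = specificities_with_motif("QxY", library)
hits_common = specificities_with_motif("CAS", library)
```

In this toy library, the two longer motifs occur only in `specA`, whereas the conserved start `CAS` occurs in both specificities and therefore would not be predictive for either one.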
For latent viruses such as CMV, EBV and AdV, virus-specific T cells frequently encounter antigen during episodes of viral reactivation. The presence of PUB-I and PUB-HS TCR-sequences in these virus-specific T cells could be rather high due to this frequent antigen encounter. However, multiple reports also observed shared CDR3b sequences in T-cell populations specific for non-latent viruses such as Influenza, RSV and SARS-CoV-2 (16,17,26), suggesting that this phenomenon is not unique to latent viruses, although the unexpectedly high magnitude of PUB-I and PUB-HS TCR-sequences that we observed may be unique to latent viruses. The percentages of PUB-I and PUB-HS TCRs within these virus-specific T-cell responses may still be an underestimation, since the prerequisite for the identification of a PUB-HS TCR was similarity to a PUB-I TCR that was present in at least 2 individuals. Highly similar TCRs that were only similar to each other, without an identical sequence in at least 2 individuals, were not included as PUB-HS TCRs. Therefore, some of the unique TCRs within the virus-specific T-cell repertoire may also be part of a public T-cell response. This was indeed illustrated by the growing percentages of PUB-I and PUB-HS sequences when more sampled sequences were included (51). Although it was suggested that HLA polymorphisms might be a confounding factor that affects the sharing of TCRs (50,52), we showed that our validation cohort with different HLA-backgrounds revealed frequencies of the PUB-I and PUB-HS TCRs of at least a similar magnitude. Our approach involved a short ex vivo expansion of the isolated virus-specific T cells that might have created a bias towards the expansion of the presently identified PUB-I and PUB-HS TCRs, indicating that the actual numbers of PUB-I and PUB-HS TCRs in unmanipulated peripheral blood may have been even higher.
In conclusion, our findings demonstrate that a large part of the virus-specific TCR-repertoire contains PUB-I and PUB-HS TCRs at high frequencies in multiple different individuals. Because virus-specific memory T-cell repertoires in the circulation are shaped by antigen encounter and subsequent proliferation, the PUB-I and PUB-HS TCR-sequences most likely reflect highly functional T cells capable of antigen-driven proliferation. Since it is plausible that the highly-similar TCRs with conserved motifs dock to the peptide-HLA complex in the same way as the identical shared TCR-sequences, these PUB-I and PUB-HS sequences can be considered part of the same public T-cell response. Such public TCRs may then be utilized for diagnostic purposes or for therapeutic benefit in TCR-gene transfer-based immunotherapy strategies to effectively control viral reactivation in immunocompromised patients.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are publicly available. These data can be found here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA803981.